{ "cells": [ { "cell_type": "markdown", "id": "547d2fc3-9e4a-429c-9e7b-d229648338fa", "metadata": {}, "source": [ "# PUMS Demo" ] }, { "cell_type": "markdown", "id": "b51e1e17-a42d-4fd0-8647-aa47264f12bc", "metadata": {}, "source": [ "## Introduction\n", "\n", "This notebook demonstrates how to load \n", "[US Census American Community Survey (ACS) Public-Use Microdata Samples](https://www.census.gov/data/developers/data-sets/census-microdata-api.ACS_5-Year_PUMS.html). The process is very much parallel to how we loaded and used\n", "[US Census redistricting data](https://www.census.gov/programs-surveys/decennial-census/about/rdo.html)\n", "in the \n", "[SoMa DIS Demo](https://github.com/vengroff/censusdis/blob/main/notebooks/SoMa%20DIS%20Demo.ipynb)\n", "and \n", "[Seeing White](https://github.com/vengroff/censusdis/blob/main/notebooks/Seeing%20White.ipynb)\n", "notebooks." ] }, { "cell_type": "markdown", "id": "82b1b1d2-24f1-4e46-89ad-9c1401ee9f2f", "metadata": {}, "source": [ "## Imports and configuration" ] }, { "cell_type": "code", "execution_count": 1, "id": "0eb570b7-9c2c-4142-b003-f6408b8d4254", "metadata": { "tags": [] }, "outputs": [], "source": [ "import censusdis.data as ced\n", "import censusdis.geography as cgeo\n", "import censusdis.states\n", "from censusdis.maps import ShapeReader" ] }, { "cell_type": "code", "execution_count": 2, "id": "1f5c31a8-47c1-4f2d-99a2-c0477b121216", "metadata": {}, "outputs": [], "source": [ "# Set your API key here.\n", "CENSUS_API_KEY = None" ] }, { "cell_type": "code", "execution_count": 3, "id": "81ec3c0f-badb-4488-b538-f26e686d2c7b", "metadata": {}, "outputs": [], "source": [ "YEAR = 2020\n", "DATASET = \"acs/acs5/pums\"" ] }, { "cell_type": "code", "execution_count": 4, "id": "2a663fe7-23be-48ce-86dd-2e584e050277", "metadata": {}, "outputs": [], "source": [ "STATE = censusdis.states.MA" ] }, { "cell_type": "markdown", "id": "25d16281-87a9-40cd-a2ba-d3a344f94002", "metadata": {}, "source": [ "## Query Metadata\n", 
"\n", "First we will see what variables are avialable in the dataset." ] }, { "cell_type": "code", "execution_count": 5, "id": "f7040c6c-9ce2-4fee-97e0-e3f3d8ce4b6b", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "dict_keys(['WKW', 'FBATHP', 'DRIVESP', 'WGTP23', 'WGTP22', 'WGTP25', 'WGTP24', 'RACNH', 'FWATP', 'WGTP21', 'WGTP20', 'WGTP27', 'WGTP26', 'WGTP29', 'WGTP28', 'FBROADBNDP', 'FDRATXP', 'FWKWNP', 'HOTWAT', 'FWKHP', 'FJWDP', 'WORKSTAT', 'FRACP', 'FFULP', 'WGTP34', 'WGTP33', 'WGTP36', 'PINCP', 'WGTP35', 'FPOBP', 'WGTP30', 'WGTP32', 'STOV', 'FMHP', 'WGTP31', 'RACAIAN', 'WGTP38', 'WGTP37', 'WGTP39', 'PUBCOV', 'SRNT', 'SEX', 'WGTP45', 'WGTP44', 'FACCESSP', 'WGTP47', 'DOUT', 'WGTP46', 'WGTP41', 'WGTP40', 'OTHSVCEX', 'WGTP43', 'RACPI', 'WGTP42', 'INDP', 'WGTP49', 'WGTP48', 'PRIVCOV', 'SFN', 'FINTP', 'HUPAC', 'SFR', 'WGTP50', 'FBLDP', 'WGTP56', 'WGTP55', 'WGTP58', 'WGTP57', 'WGTP52', 'WGTP51', 'DEAR', 'WGTP54', 'DIS', 'WGTP53', 'ACR', 'VACS', 'FINSP', 'WGTP59', 'FMILPP', 'MARHYP', 'ADJHSG', 'PAP', 'WGTP7', 'PWGTP30', 'WGTP6', 'PWGTP31', 'HINCP', 'WGTP5', 'PWGTP32', 'WKWN', 'WGTP4', 'PWGTP33', 'PWGTP34', 'PWGTP35', 'WGTP9', 'PWGTP36', 'WGTP8', 'WGTP3', 'WGTP2', 'RACWHT', 'WGTP1', 'GASP', 'PWGTP26', 'PWGTP27', 'PWGTP28', 'PWGTP29', 'FBDSP', 'FWKLP', 'FSCHP', 'FMARHTP', 'PWGTP20', 'PWGTP21', 'GRPIP', 'PWGTP22', 'PWGTP23', 'PWGTP24', 'PWGTP25', 'JWTRNS', 'FRNTMP', 'FOTHSVCEXP', 'CIT', 'LAPTOP', 'FMRGIP', 'JWRIP', 'PWGTP15', 'FCOMPOTHXP', 'PWGTP16', 'FSEMP', 'PWGTP17', 'PWGTP18', 'PWGTP19', 'ENG', 'FLANP', 'PWGTP10', 'PWGTP11', 'FCITWP', 'PWGTP12', 'PWGTP13', 'PWGTP14', 'FSSIP', 'FS', 'DIVISION', 'WRK', 'HICOV', 'DRATX', 'SMOCP', 'VPS', 'NWLA', 'FACRP', 'SCIENGRLP', 'FJWTRNSP', 'WGTP12', 'WGTP11', 'WGTP14', 'WGTP13', 'AGS', 'FSTOVP', 'WGTP10', 'WGTP19', 'WGTP16', 'WGTP15', 'WGTP18', 'WGTP17', 'FMARHWP', 'FTELP', 'PWGTP73', 'PWGTP74', 'QTRBIR', 'PWGTP75', 'FPERNP', 'PWGTP76', 'PWGTP77', 'PWGTP78', 'PWGTP79', 'PWGTP70', 'PWGTP71', 'PWGTP72', 'FDISP', 'PWGTP', 
'WAGP', 'RWAT', 'PWGTP62', 'PWGTP63', 'PWGTP64', 'PWGTP65', 'PWGTP66', 'PWGTP67', 'PWGTP68', 'PWGTP69', 'FPLMPRP', 'TAXAMT', 'ANC2P', 'NWLK', 'PWGTP60', 'FPUBCOVP', 'PWGTP61', 'REGION', 'FMARHYP', 'FRETP', 'SMP', 'PWGTP59', 'FTENP', 'BLD', 'MAR', 'SMX', 'FVEHP', 'PWGTP51', 'PWGTP52', 'PWGTP53', 'PWGTP54', 'PWGTP55', 'GASFP', 'PWGTP56', 'PWGTP57', 'PWGTP58', 'VALP', 'PWGTP50', 'FDEYEP', 'FCITP', 'RACNUM', 'PWGTP48', 'PWGTP49', 'SPORDER', 'FANCP', 'PWGTP40', 'PWGTP41', 'DRAT', 'PWGTP42', 'PWGTP43', 'MRGI', 'ESP', 'PWGTP44', 'WGTP', 'PWGTP45', 'OCCP', 'ESR', 'PWGTP46', 'PWGTP47', 'FMRGP', 'MRGP', 'COW', 'TABLET', 'MRGX', 'MULTG', 'MRGT', 'PWGTP37', 'FOCCP', 'PWGTP38', 'PWGTP39', 'FREFRP', 'WGTP61', 'WGTP60', 'WGTP67', 'WGTP66', 'WGTP69', 'WGTP68', 'WGTP63', 'FRWATP', 'WGTP62', 'WGTP65', 'WGTP64', 'RAC1P', 'RNTM', 'DREM', 'MIGSP', 'FHICOVP', 'NWRE', 'FDIALUPP', 'RNTP', 'WGTP70', 'HUPAOC', 'WGTP72', 'WGTP71', 'WGTP78', 'MV', 'WGTP77', 'WGTP79', 'WGTP74', 'WGTP73', 'WGTP76', 'WGTP75', 'FAGSP', 'ANC', 'OIP', 'WGTP80', 'NP', 'NR', 'LNGI', 'ANC1P', 'HISPEED', 'PLM', 'RAC3P', 'OC', 'LANP', 'FLAPTOPP', 'FPRIVCOVP', 'LANX', 'HUPARC', 'FMVP', 'SVAL', 'PWGTP80', 'RWATPR', 'FWKWP', 'RAC2P', 'FPOWSP', 'SSP', 'R18', 'NAICSP', 'WATFP', 'RMSP', 'FDRATP', 'BDSP', 'FGASP', 'RESMODE', 'FSEXP', 'MHP', 'REFR', 'POWSP', 'FPLMP', 'JWAP', 'DIALUP', 'ELEFP', 'RELSHIPP', 'MIG', 'RC', 'MIL', 'WAOB', 'FHINS7P', 'RT', 'INTP', 'RACASN', 'FRELSHIPP', 'FHISPEEDP', 'FINDP', 'FPAP', 'JWMNP', 'FHINS6P', 'FKITP', 'INSP', 'ST', 'FFINCP', 'YBL', 'FINCP', 'YOEP', 'HINS6', 'FSMXHP', 'HINS7', 'FTABLETP', 'FLANXP', 'GRNTP', 'FCOWP', 'FJWMNP', 'HINS1', 'HINS2', 'HINS3', 'HINS4', 'HINS5', 'FDDRSP', 'FWAGP', 'FGRNTP', 'PUMA', 'FHINS5P', 'FSCHGP', 'FHINS5C', 'FVACSP', 'FHOTWATP', 'R60', 'FMARHDP', 'FMIGSP', 'R65', 'MARHT', 'FHINS4P', 'SERIALNO', 'DECADE', 'MARHW', 'MARHM', 'FGCRP', 'FHINS4C', 'PSF', 'RACSOR', 'NOC', 'POBP', 'CPLT', 'TYPEHUGQ', 'NOP', 'FSMARTPHONP', 'FULP', 'FHINS3P', 'SINK', 'KIT', 'FVALP', 
'CONCAT_ID', 'JWDP', 'WKEXREL', 'DPHY', 'FWRKP', 'NPF', 'FHINS3C', 'FDEARP', 'NPP', 'FHINS2P', 'FRWATPRP', 'FSCHLP', 'MLPCD', 'FPARC', 'MIGPUMA', 'HUGCL', 'DDRS', 'MARHD', 'FMRGXP', 'SATELLITE', 'POVPIP', 'FULFP', 'BATH', 'WATP', 'GCM', 'FSATELLITEP', 'GCL', 'FHINS1P', 'WKHP', 'RECORD_TYPE', 'GCR', 'FRNTP', 'NRC', 'HFL', 'FSMP', 'ACCESSINET', 'ADJINC', 'FRMSP', 'POWPUMA', 'PARTNER', 'FELEP', 'CONP', 'FMIGP', 'FFSP', 'FGCMP', 'HISP', 'FESRP', 'HHL', 'AGEP', 'DEYE', 'SEMP', 'HHT', 'OCPIP', 'FGCLP', 'FENGP', 'SCHG', 'FTAXP', 'MSP', 'RACBLK', 'FMRGTP', 'FAGEP', 'SCHL', 'PWGTP9', 'FER', 'MLPFG', 'PWGTP8', 'NATIVITY', 'PWGTP7', 'SMARTPHONE', 'PWGTP6', 'FES', 'PWGTP5', 'PWGTP4', 'VEH', 'PWGTP3', 'FFERP', 'PWGTP2', 'PWGTP1', 'HHT2', 'FYOEP', 'FDREMP', 'NWAB', 'FCONP', 'FYBLP', 'FOIP', 'FHINCP', 'FHISP', 'BROADBND', 'NWAV', 'FDPHYP', 'PAOC', 'FFODP', 'FMARHMP', 'SSIP', 'FJWRIP', 'COMPOTHX', 'FSINKP', 'ELEP', 'FHFLP', 'WIF', 'FSMXSP', 'FOD2P', 'RETP', 'FSSP', 'PLMPRP', 'SCIENGP', 'CITWP', 'FMILSP', 'FPINCP', 'FOD1P', 'FMARP', 'SOCP', 'MLPE', 'MLPH', 'MLPA', 'FSMOCP', 'MLPB', 'FDOUTP', 'PERNP', 'WKL', 'SCH', 'TEL', 'TEN', 'MLPI', 'MLPJ', 'MLPK'])" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "group = ced.variables.get_group(DATASET, YEAR, None)\n", "\n", "group.keys()" ] }, { "cell_type": "code", "execution_count": 6, "id": "9da6fb7c-c3c7-40f0-a0fa-12a7abc74eb1", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "{'label': 'Age',\n", " 'predicateType': 'int',\n", " 'group': 'N/A',\n", " 'limit': 0,\n", " 'suggested-weight': 'PWGTP',\n", " 'values': {'item': {'0': 'Under 1 year'},\n", " 'range': [{'min': '1',\n", " 'max': '99',\n", " 'description': '1 to 99 years (Top-coded)'}]},\n", " 'name': 'AGEP'}" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "group[\"AGEP\"]" ] }, { "cell_type": "markdown", "id": "7f2ee674-0e68-4f92-a6ed-ce06432a6077", "metadata": {}, "source": [ "Next we 
will see what geographies are available. Note that PUMS data is available in far fewer\n", "geography hierarchies than the full ACS5 data set." ] }, { "cell_type": "code", "execution_count": 7, "id": "d1d12070-6e8c-4092-b4c9-bd78c900cd6b", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "{'020': ['region'],\n", " '030': ['division'],\n", " '040': ['state'],\n", " '795': ['state', 'public_use_microdata_area']}" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "cgeo.geo_path_snake_specs(DATASET, YEAR)" ] }, { "cell_type": "markdown", "id": "f37165db-4da9-4aff-a0c6-d27c7c1a7247", "metadata": {}, "source": [ "## Query Age and its Suggested Weight at the PUMA Level" ] }, { "cell_type": "code", "execution_count": 8, "id": "2ec8e899-6d50-4fd2-b153-6e0c141b6024", "metadata": {}, "outputs": [], "source": [ "query_variables = [\"AGEP\"]" ] }, { "cell_type": "code", "execution_count": 9, "id": "bec0fa6b-8926-4394-83d9-0c02512fa9cf", "metadata": {}, "outputs": [], "source": [ "variable_weights = {\n", " variable: group[variable][\"suggested-weight\"] for variable in query_variables\n", "}\n", "\n", "unique_weights = list(set(variable_weights.values()))" ] }, { "cell_type": "code", "execution_count": 10, "id": "3fbf90d9-ea15-4157-8037-d819f618b4a3", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
<table>\n",
"  <thead>\n",
"    <tr><th></th><th>STATE</th><th>PUBLIC_USE_MICRODATA_AREA</th><th>AGEP</th><th>PWGTP</th></tr>\n",
"  </thead>\n",
"  <tbody>\n",
"    <tr><th>0</th><td>25</td><td>3900</td><td>46</td><td>14</td></tr>\n",
"    <tr><th>1</th><td>25</td><td>3900</td><td>46</td><td>12</td></tr>\n",
"    <tr><th>2</th><td>25</td><td>3900</td><td>12</td><td>13</td></tr>\n",
"    <tr><th>3</th><td>25</td><td>302</td><td>52</td><td>14</td></tr>\n",
"    <tr><th>4</th><td>25</td><td>302</td><td>21</td><td>17</td></tr>\n",
"    <tr><th>...</th><td>...</td><td>...</td><td>...</td><td>...</td></tr>\n",
"    <tr><th>335405</th><td>25</td><td>506</td><td>65</td><td>12</td></tr>\n",
"    <tr><th>335406</th><td>25</td><td>4301</td><td>89</td><td>20</td></tr>\n",
"    <tr><th>335407</th><td>25</td><td>400</td><td>51</td><td>5</td></tr>\n",
"    <tr><th>335408</th><td>25</td><td>400</td><td>56</td><td>6</td></tr>\n",
"    <tr><th>335409</th><td>25</td><td>400</td><td>5</td><td>6</td></tr>\n",
"  </tbody>\n",
"</table>\n",
"<p>335410 rows × 4 columns</p>
\n", "